Semi-Automated Identification of Ontological Labels in the Biomedical Literature with goldi
Abstract
Recent growth in both the scale and the scope of large, publicly available ontologies has spurred the development of computational methodologies that leverage structured information to answer important questions. However, ontological labels, or “terms”, have thus far proved difficult to use in practice: text mining, a crucial part of electronically parsing and understanding the biomedical literature, has historically had difficulty identifying terms in free text. In this article, we present goldi, an open source R package whose goal is to identify terms of variable length in free-form text. It is available at https://github.com/Chris1221/goldi or through CRAN. The algorithm identifies the words, or synonyms of words, from each term that are present in the text and compares the number of present words to an acceptance function to decide whether the term is matched. We present the theoretical rationale behind the algorithm, as well as practical advice for its use in Gene Ontology term identification and quantification. We additionally detail the available options and describe their respective computational efficiencies.
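The matching idea described in the abstract can be illustrated with a minimal sketch. It is written in Python rather than R and does not reproduce goldi's actual API; the function names (`tokenize`, `acceptance_threshold`, `match_terms`), the synonym table, and the specific acceptance rule are all assumptions chosen for illustration only.

```python
import re


def tokenize(text):
    """Lower-case the text and return its alphabetic word tokens as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def acceptance_threshold(term_length):
    """Hypothetical acceptance function (an assumption, not goldi's rule).

    Returns the minimum number of a term's words that must be present:
    terms of one or two words need every word, longer terms tolerate one miss.
    """
    return term_length if term_length <= 2 else term_length - 1


def match_terms(text, terms, synonyms=None):
    """Return the terms whose present-word count meets the acceptance threshold.

    `synonyms` maps a word to alternative words that also count as a match.
    """
    synonyms = synonyms or {}
    tokens = tokenize(text)
    accepted = []
    for term in terms:
        words = re.findall(r"[a-z]+", term.lower())
        # Count term words found in the text directly or through a synonym.
        present = sum(
            1
            for word in words
            if word in tokens or any(s in tokens for s in synonyms.get(word, ()))
        )
        if present >= acceptance_threshold(len(words)):
            accepted.append(term)
    return accepted


sentence = "We observed programmed death of the cell after treatment."
candidate_terms = ["cell death", "regulation of cell cycle", "DNA repair"]
print(match_terms(sentence, candidate_terms))  # ['cell death']
```

Because the words are compared as a set, the term is found even though "cell" and "death" are neither adjacent nor in order in the sentence. A stricter or looser acceptance function changes how many partially present terms are reported.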